02. The Big Picture
The Big Picture
Before digging into the details of policy gradient methods, we'll discuss how they work at a high level.
M3L3 C02 V6
## Quiz
SOLUTION:
- For each episode, if the agent won the game, we'll amend the policy network weights to make each (state, action) pair that appeared in the episode to make them more likely to repeat in future episodes.
- For each episode, if the agent lost the game, we'll change the policy network weights to make it less likely to repeat the corresponding (state, action) pairs in future episodes.